Dynamically Maintaining Duplicate-Insensitive and Time-Decayed Sum Using Time-Decaying Bloom Filter

Authors

  • Yu Zhang
  • Hong Shen
  • Hui Tian
  • Xianchao Zhang
Abstract

The duplicate-insensitive, time-decayed sum of an arbitrary subset of a stream is an important aggregate for many analyses in distributed stream scenarios. In general, computing this sum exactly over an unbounded, high-rate stream is infeasible. We therefore address this problem by introducing a sketch, the time-decaying Bloom filter (TDBF). The TDBF detects duplicates in a stream while dynamically maintaining the decayed weight of every distinct element according to a user-specified decay function. Given a query for the current decayed sum of a subset of the stream, the TDBF returns an estimate. Our theoretical analysis gives a provable approximation guarantee on the estimation error, and experimental results on a synthetic stream validate that analysis.
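
To make the idea in the abstract concrete, here is a minimal Python sketch of a Bloom-filter-style array whose cells hold timestamps, queried under a user-supplied decay function. It is not the authors' TDBF construction, which the abstract does not spell out. The class name `TimestampBloomSketch`, the salted-hash scheme, the "keep the latest timestamp" update rule, and the exponential decay function are all assumptions chosen for illustration.

```python
import hashlib
import math


class TimestampBloomSketch:
    """Illustrative only: Bloom-style cells that store timestamps, not bits."""

    def __init__(self, num_cells, num_hashes):
        self.num_cells = num_cells
        self.num_hashes = num_hashes
        self.cells = [None] * num_cells  # None means "never touched"

    def _positions(self, item):
        # Derive k cell indices by salting one hash function k times.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=i.to_bytes(2, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_cells

    def insert(self, item, timestamp):
        # Assumed rule: keep the latest timestamp per cell. Taking the max
        # makes re-inserting the same (item, timestamp) pair a no-op.
        for pos in self._positions(item):
            current = self.cells[pos]
            if current is None or timestamp > current:
                self.cells[pos] = timestamp

    def estimated_timestamp(self, item):
        # Collisions can only raise a cell, so the minimum over the k cells
        # is the tightest available upper bound on the item's timestamp.
        values = [self.cells[pos] for pos in self._positions(item)]
        if any(v is None for v in values):
            return None  # the item was definitely never inserted
        return min(values)

    def decayed_sum(self, items, now, decay):
        # Sum decayed weights over the distinct items of the queried subset.
        total = 0.0
        for item in set(items):
            ts = self.estimated_timestamp(item)
            if ts is not None:
                total += decay(now - ts)
        return total


def exponential_decay(age, rate=0.1):
    # One possible user-specified decay function (an assumption).
    return math.exp(-rate * age)


sketch = TimestampBloomSketch(num_cells=1024, num_hashes=3)
for item, ts in [("a", 0.0), ("b", 5.0), ("a", 0.0)]:  # "a" arrives twice
    sketch.insert(item, ts)
estimate = sketch.decayed_sum(["a", "b"], now=10.0, decay=exponential_decay)
```

In this simplified variant, hash collisions can only make an element look more recent than it is, so with a non-increasing decay function the estimate never falls below the exact decayed sum of the inserted elements. The error guarantee stated in the abstract belongs to the authors' TDBF, not to this sketch.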

Similar resources

Time-decaying Sketches for Robust Aggregation of Sensor Data

We present a new sketch for summarizing network data. The sketch has the following properties which make it useful in communication-efficient aggregation in distributed streaming scenarios, such as sensor networks: the sketch is duplicate insensitive, i.e., reinsertions of the same data will not affect the sketch and hence the estimates of aggregates. Unlike previous duplicate-insensitive sketc...

Dynamically Adaptive Count Bloom Filter for Handling Duplicates in Data Stream

Identifying and removing duplicates in data stream applications is a primary challenge for traditional duplicate-elimination techniques. In many streaming scenarios it is not feasible to eliminate duplicates exactly in an unbounded data stream. However, existing variants of the Bloom filter cannot dynamically adapt both the filter and the counter together. In this paper we...

An Approximate Duplicate-Elimination in RFID Data Streams Based on d-Left Time Bloom Filter

Article history: received 6 March 2010; received in revised form 16 July 2011; accepted 18 July 2011; available online 31 July 2011. RFID technology has been applied in a wide range of areas because it can detect RFID tags without contact. However, because a tag is often read multiple times and multiple readers are deployed, RFID data contains many duplica...

Advanced Bloom Filter Based Algorithms for Efficient Approximate Data De-Duplication in Streams

Data-intensive applications and computing have emerged as a central area of modern research with the explosion of data stored worldwide. Applications involving telecommunication call data records, web pages, online transactions, medical records, stock markets, climate warning systems, etc., require efficient management and processing of such massive and rapidly growing amounts of data from diverse...

A New Memory Efficient Technique for Fraud Detection in Web Advertising Networks

The advertising network acts as the middleman in web advertising between advertisers and publishers. This paper presents a memory-efficient fraud detection technique with an intelligent classification engine, which advertising networks can use to scan offline streams of clicks and impressions occurring on the publisher side in order to detect click fraud and impression ...

Publication date: 2009